Overview
Dataset statistics
| Number of variables | 22 |
|---|---|
| Number of observations | 2785189 |
| Missing cells | 12252161 |
| Missing cells (%) | 20.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 430.3 MiB |
| Average record size in memory | 162.0 B |
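The two derived figures in this table come from the raw counts. Missing cells are counted over every cell (variables × observations), and the average record size is the total memory divided by the number of rows. Here is a minimal sketch that reproduces both figures from the values above. It assumes that "MiB" means 2²⁰ bytes and uses the rounded 430.3 MiB as reported.

```python
# Reproduce the derived overview figures from the raw counts in the table above.
N_VARIABLES = 22
N_OBSERVATIONS = 2_785_189
MISSING_CELLS = 12_252_161
TOTAL_MEMORY_MIB = 430.3  # as reported, already rounded to one decimal


def missing_cells_pct(missing: int, n_vars: int, n_obs: int) -> float:
    """Share of all cells (variables x observations) that are missing, in percent."""
    return 100.0 * missing / (n_vars * n_obs)


def avg_record_bytes(total_mib: float, n_obs: int) -> float:
    """Average memory per row in bytes, treating 1 MiB as 2**20 bytes."""
    return total_mib * 2**20 / n_obs


print(f"Missing cells (%): {missing_cells_pct(MISSING_CELLS, N_VARIABLES, N_OBSERVATIONS):.1f}%")
print(f"Average record size: {avg_record_bytes(TOTAL_MEMORY_MIB, N_OBSERVATIONS):.1f} B")
```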
Variable types
| Numeric | 10 |
|---|---|
| Unsupported | 2 |
| Text | 2 |
| DateTime | 4 |
| Categorical | 2 |
| Boolean | 2 |
Alerts
| IBNR is highly overall correlated with starting_station_IBNR | High correlation |
|---|---|
| arrival_delay_m is highly overall correlated with departure_delay_m | High correlation |
| departure_delay_m is highly overall correlated with arrival_delay_m | High correlation |
| info is highly overall correlated with info_present and 4 other fields | High correlation |
| info_present is highly overall correlated with info and 1 other field | High correlation |
| lat is highly overall correlated with info and 1 other field | High correlation |
| long is highly overall correlated with info | High correlation |
| starting_station_IBNR is highly overall correlated with IBNR and 1 other field | High correlation |
| transformed_info_message is highly overall correlated with info and 1 other field | High correlation |
| zip is highly overall correlated with info and 2 other fields | High correlation |
| canceled is highly imbalanced (61.2%) | Imbalance |
| transformed_info_message is highly imbalanced (53.9%) | Imbalance |
| last_station has 40796 (1.5%) missing values | Missing |
| IBNR has 135873 (4.9%) missing values | Missing |
| long has 142208 (5.1%) missing values | Missing |
| lat has 142208 (5.1%) missing values | Missing |
| arrival_plan has 1183672 (42.5%) missing values | Missing |
| departure_plan has 972317 (34.9%) missing values | Missing |
| arrival_change has 1422439 (51.1%) missing values | Missing |
| departure_change has 1281468 (46.0%) missing values | Missing |
| arrival_delay_m has 972317 (34.9%) missing values | Missing |
| departure_delay_m has 972317 (34.9%) missing values | Missing |
| info has 2201357 (79.0%) missing values | Missing |
| clear_station_name has 2785189 (100.0%) missing values | Missing |
| line is an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
| clear_station_name is an unsupported type; check whether it needs cleaning or further analysis | Unsupported |
| arrival_delay_m has 1256668 (45.1%) zeros | Zeros |
| departure_delay_m has 1190749 (42.8%) zeros | Zeros |
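Both imbalance percentages match a normalised-entropy score of 1 − H / ln k. Here H is the Shannon entropy of the category shares and k is the number of categories. The sketch below assumes that definition. It reproduces both reported figures from the category counts given later in this report, but it is not copied from the ydata-profiling source.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 minus the Shannon entropy of the category shares, normalised by ln(k).

    0 means perfectly balanced; values near 1 mean one category dominates.
    """
    k = len(counts)
    if k < 2:
        raise ValueError("imbalance needs at least two categories")
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1.0 - entropy / math.log(k)


# Category counts taken from the canceled and transformed_info_message sections.
canceled = [2_573_834, 211_355]
transformed_info_message = [2_201_357, 292_748, 153_125, 130_633, 7_326]

print(f"canceled: {imbalance_score(canceled):.1%}")
print(f"transformed_info_message: {imbalance_score(transformed_info_message):.1%}")
```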
Reproduction
| Analysis started | 2024-11-17 18:57:13.391912 |
|---|---|
| Analysis finished | 2024-11-17 18:59:41.505396 |
| Duration | 2 minutes and 28.11 seconds |
| Software version | ydata-profiling v4.11.0 |
| Download configuration | config.json |
Variables
ID_Base
Real number (ℝ)
| Distinct | 50013 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -2.2534825 × 10¹⁶ |
| Minimum | -9.223177 × 10¹⁸ |
| Maximum | 9.223057 × 10¹⁸ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 1399190 |
| Negative (%) | 50.2% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | -9.223177 × 10¹⁸ |
|---|---|
| 5-th percentile | -8.3285499 × 10¹⁸ |
| Q1 | -4.5909339 × 10¹⁸ |
| median | -4.62402 × 10¹⁶ |
| Q3 | 4.5625228 × 10¹⁸ |
| 95-th percentile | 8.331309 × 10¹⁸ |
| Maximum | 9.223057 × 10¹⁸ |
| Range | 1.8446234 × 10¹⁹ |
| Interquartile range (IQR) | 9.1534567 × 10¹⁸ |
Descriptive statistics
| Standard deviation | 5.3245568 × 10¹⁸ |
|---|---|
| Coefficient of variation (CV) | -236.28126 |
| Kurtosis | -1.19281 |
| Mean | -2.2534825 × 10¹⁶ |
| Median Absolute Deviation (MAD) | 4.5737344 × 10¹⁸ |
| Skewness | 0.0086452172 |
| Sum | ≈ -6.2763747 × 10²² (mean × observations) |
| Variance | 2.8350905 × 10³⁷ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.256484864 × 10¹⁸ | 413 | < 0.1% |
| 8.467202706 × 10¹⁸ | 413 | < 0.1% |
| -7.996941865 × 10¹⁸ | 412 | < 0.1% |
| 8.668076605 × 10¹⁸ | 391 | < 0.1% |
| -2.094717035 × 10¹⁸ | 378 | < 0.1% |
| 2.688663988 × 10¹⁸ | 361 | < 0.1% |
| -8.560851479 × 10¹⁸ | 350 | < 0.1% |
| -1.78380972 × 10¹⁷ | 350 | < 0.1% |
| 1.373163551 × 10¹⁸ | 350 | < 0.1% |
| 7.295358315 × 10¹⁸ | 350 | < 0.1% |
| Other values (50003) | 2781421 | 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -9.223176951 × 10¹⁸ | 10 | < 0.1% |
| -9.222914435 × 10¹⁸ | 7 | < 0.1% |
| -9.222740236 × 10¹⁸ | 2 | < 0.1% |
| -9.222707015 × 10¹⁸ | 5 | < 0.1% |
| -9.222587614 × 10¹⁸ | 17 | < 0.1% |
| -9.222235769 × 10¹⁸ | 49 | < 0.1% |
| -9.221813993 × 10¹⁸ | 210 | < 0.1% |
| -9.221685702 × 10¹⁸ | 7 | < 0.1% |
| -9.221229322 × 10¹⁸ | 25 | < 0.1% |
| -9.221103336 × 10¹⁸ | 98 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9.223056978 × 10¹⁸ | 2 | < 0.1% |
| 9.221732242 × 10¹⁸ | 84 | < 0.1% |
| 9.221398198 × 10¹⁸ | 7 | < 0.1% |
| 9.221055243 × 10¹⁸ | 63 | < 0.1% |
| 9.220892138 × 10¹⁸ | 20 | < 0.1% |
| 9.22087069 × 10¹⁸ | 56 | < 0.1% |
| 9.220854484 × 10¹⁸ | 14 | < 0.1% |
| 9.219893508 × 10¹⁸ | 144 | < 0.1% |
| 9.219684671 × 10¹⁸ | 5 | < 0.1% |
| 9.219589171 × 10¹⁸ | 14 | < 0.1% |
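ID_Base spans almost the whole signed 64-bit range. Any statistic computed in fixed-width int64 arithmetic, such as maximum minus minimum or a column sum, can therefore silently wrap around, as NumPy int64 arithmetic does. The sketch below shows this wrap-around in plain Python. It uses the reported minimum and maximum rounded to 10 significant digits, because the exact IDs are not shown in the report, so both extremes are illustrative.

```python
def to_int64(x: int) -> int:
    """Reduce an arbitrary-precision int to signed 64-bit two's complement."""
    x &= (1 << 64) - 1  # keep only the low 64 bits
    return x - (1 << 64) if x >= (1 << 63) else x


# Rounded extremes of ID_Base from the tables above (illustrative, not the exact IDs).
id_min = -9_223_176_951_000_000_000
id_max = 9_223_056_978_000_000_000

true_range = id_max - id_min            # Python ints never overflow
wrapped_range = to_int64(true_range)    # what an int64 subtraction would return

print(f"true range:   {true_range:.7e}")
print(f"int64 result: {wrapped_range:.7e}")
```

The wrapped result is negative and of order 10¹⁴. That pattern is a typical sign that an integer statistic overflowed and was not computed correctly.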
ID_Timestamp
Real number (ℝ)
| Distinct | 10155 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.4071086 × 10⁹ |
| Minimum | 2.4070319 × 10⁹ |
| Maximum | 2.4071424 × 10⁹ |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 2.4070319 × 10⁹ |
|---|---|
| 5-th percentile | 2.4070807 × 10⁹ |
| Q1 | 2.4070914 × 10⁹ |
| median | 2.4071109 × 10⁹ |
| Q3 | 2.4071223 × 10⁹ |
| 95-th percentile | 2.4071415 × 10⁹ |
| Maximum | 2.4071424 × 10⁹ |
| Range | 110502 |
| Interquartile range (IQR) | 30851 |
Descriptive statistics
| Standard deviation | 21614.602 |
|---|---|
| Coefficient of variation (CV) | 8.9794879 × 10⁻⁶ |
| Kurtosis | -0.0095195568 |
| Mean | 2.4071086 × 10⁹ |
| Median Absolute Deviation (MAD) | 19390 |
| Skewness | -0.35481539 |
| Sum | 6.7042524 × 10¹⁵ |
| Variance | 4.6719104 × 10⁸ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2407090833 | 924 | < 0.1% |
| 2407080833 | 921 | < 0.1% |
| 2407100833 | 909 | < 0.1% |
| 2407110833 | 909 | < 0.1% |
| 2407091633 | 905 | < 0.1% |
| 2407081633 | 899 | < 0.1% |
| 2407120833 | 898 | < 0.1% |
| 2407090733 | 896 | < 0.1% |
| 2407080733 | 895 | < 0.1% |
| 2407111633 | 891 | < 0.1% |
| Other values (10145) | 2776142 | 99.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2407031857 | 3 | < 0.1% |
| 2407040236 | 24 | < 0.1% |
| 2407040245 | 11 | < 0.1% |
| 2407040253 | 13 | < 0.1% |
| 2407040302 | 19 | < 0.1% |
| 2407040303 | 6 | < 0.1% |
| 2407040312 | 22 | < 0.1% |
| 2407040313 | 37 | < 0.1% |
| 2407040314 | 6 | < 0.1% |
| 2407040317 | 67 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2407142359 | 6 | < 0.1% |
| 2407142358 | 17 | < 0.1% |
| 2407142357 | 3 | < 0.1% |
| 2407142356 | 9 | < 0.1% |
| 2407142355 | 6 | < 0.1% |
| 2407142354 | 8 | < 0.1% |
| 2407142353 | 22 | < 0.1% |
| 2407142352 | 10 | < 0.1% |
| 2407142351 | 42 | < 0.1% |
| 2407142350 | 29 | < 0.1% |
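The ID_Timestamp values look like a compact yyMMddHHmm encoding. Under that reading, the minimum 2407031857 is 2024-07-03 18:57 and the maximum 2407142359 is 2024-07-14 23:59, which matches the latest planned departure time. This is an inference from the values; the report does not state it. If the inference holds, the mean and the range of 110502 are not durations, and decoding the values first gives actual times. A minimal sketch under that assumption:

```python
from datetime import datetime


def decode_id_timestamp(value: int) -> datetime:
    """Decode a yyMMddHHmm integer (assumed encoding) into a datetime."""
    return datetime.strptime(f"{value:010d}", "%y%m%d%H%M")


first = decode_id_timestamp(2407031857)  # reported minimum
last = decode_id_timestamp(2407142359)   # reported maximum
span_minutes = (last - first).total_seconds() / 60

print(first, last, span_minutes)
```

Decoded this way, the span is 16142 minutes (about 11.2 days) rather than the 110502 that plain integer subtraction reports.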
stop_number
Real number (ℝ)
| Distinct | 59 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.7907115 |
| Minimum | 1 |
| Maximum | 59 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 8 |
| Q3 | 14 |
| 95-th percentile | 25 |
| Maximum | 59 |
| Range | 58 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 7.5890828 |
|---|---|
| Coefficient of variation (CV) | 0.77513088 |
| Kurtosis | 0.7903732 |
| Mean | 9.7907115 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 1.0328565 |
| Sum | 27268982 |
| Variance | 57.594178 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 257818 | 9.3% |
| 2 | 206965 | 7.4% |
| 3 | 196635 | 7.1% |
| 4 | 187543 | 6.7% |
| 5 | 175516 | 6.3% |
| 6 | 162330 | 5.8% |
| 7 | 150002 | 5.4% |
| 8 | 139863 | 5.0% |
| 9 | 130318 | 4.7% |
| 10 | 120314 | 4.3% |
| Other values (49) | 1057885 | 38.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 257818 | 9.3% |
| 2 | 206965 | 7.4% |
| 3 | 196635 | 7.1% |
| 4 | 187543 | 6.7% |
| 5 | 175516 | 6.3% |
| 6 | 162330 | 5.8% |
| 7 | 150002 | 5.4% |
| 8 | 139863 | 5.0% |
| 9 | 130318 | 4.7% |
| 10 | 120314 | 4.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 59 | 36 | < 0.1% |
| 58 | 36 | < 0.1% |
| 57 | 36 | < 0.1% |
| 56 | 36 | < 0.1% |
| 55 | 36 | < 0.1% |
| 54 | 45 | < 0.1% |
| 53 | 62 | < 0.1% |
| 52 | 62 | < 0.1% |
| 51 | 76 | < 0.1% |
| 50 | 146 | < 0.1% |
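Every Frequency (%) figure in these tables is a share of all 2785189 rows, with missing values included in the denominator. The share is rounded to one decimal, and shares that round to 0.0 are shown as "< 0.1%". The rounding rule is inferred from the reported figures: for example, a count of 1416 appears as 0.1%. The sketch below implements that convention and checks it against the stop_number counts.

```python
TOTAL_ROWS = 2_785_189


def frequency_label(count: int, total: int = TOTAL_ROWS) -> str:
    """Format a count as a Frequency (%) cell (assumed rounding convention)."""
    pct = round(100.0 * count / total, 1)
    return "< 0.1%" if pct == 0.0 else f"{pct:.1f}%"


for value, count in [(1, 257_818), ("Other values (49)", 1_057_885), (59, 36)]:
    print(value, count, frequency_label(count))
```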
line
Unsupported
Rejected  Unsupported 
| Missing | 0 |
|---|---|
| Missing (%) | 0.0% |
| Memory size | 21.2 MiB |
starting_station_IBNR
Real number (ℝ)
High correlation 
| Distinct | 1733 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8016680.2 |
| Minimum | 8000001 |
| Maximum | 8098360 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 8000001 |
|---|---|
| 5-th percentile | 8000106 |
| Q1 | 8001864 |
| median | 8004241 |
| Q3 | 8010226 |
| 95-th percentile | 8089078 |
| Maximum | 8098360 |
| Range | 98359 |
| Interquartile range (IQR) | 8362 |
Descriptive statistics
| Standard deviation | 29613.401 |
|---|---|
| Coefficient of variation (CV) | 0.0036939731 |
| Kurtosis | 1.8030415 |
| Mean | 8016680.2 |
| Median Absolute Deviation (MAD) | 2952 |
| Skewness | 1.9195194 |
| Sum | 2.232797 × 10¹³ |
| Variance | 8.7695352 × 10⁸ |
| Monotonicity | Increasing |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 8003184 | 38109 | 1.4% |
| 8089116 | 35053 | 1.3% |
| 8089111 | 34104 | 1.2% |
| 8005106 | 28776 | 1.0% |
| 8006750 | 26423 | 0.9% |
| 8089078 | 24706 | 0.9% |
| 8006319 | 24370 | 0.9% |
| 8089053 | 23509 | 0.8% |
| 8006404 | 23150 | 0.8% |
| 8089022 | 21631 | 0.8% |
| Other values (1723) | 2505358 | 90.0% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8000001 | 181 | < 0.1% |
| 8000002 | 693 | < 0.1% |
| 8000004 | 3447 | 0.1% |
| 8000007 | 821 | < 0.1% |
| 8000009 | 897 | < 0.1% |
| 8000010 | 1624 | 0.1% |
| 8000011 | 71 | < 0.1% |
| 8000012 | 1731 | 0.1% |
| 8000013 | 2491 | 0.1% |
| 8000014 | 359 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8098360 | 153 | < 0.1% |
| 8089537 | 469 | < 0.1% |
| 8089474 | 283 | < 0.1% |
| 8089473 | 103 | < 0.1% |
| 8089472 | 16127 | 0.6% |
| 8089330 | 283 | < 0.1% |
| 8089329 | 1416 | 0.1% |
| 8089328 | 10 | < 0.1% |
| 8089131 | 99 | < 0.1% |
| 8089118 | 235 | < 0.1% |
city
Text
| Distinct | 1173 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 21.2 MiB |
Length
| Max length | 25 |
|---|---|
| Median length | 22 |
| Mean length | 9.327129 |
| Min length | 3 |
Unique
| Unique | 9 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Aachen |
|---|---|
| 2nd row | Aachen |
| 3rd row | Aachen |
| 4th row | Aachen |
| 5th row | Aachen |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| berlin | 329373 | 10.1% |
| hamburg | 162071 | 5.0% |
| am | 61714 | 1.9% |
| main | 52698 | 1.6% |
| münchen | 52664 | 1.6% |
| bad | 46686 | 1.4% |
| frankfurt | 42841 | 1.3% |
| karlsruhe | 41627 | 1.3% |
| düsseldorf | 40213 | 1.2% |
| dortmund | 35100 | 1.1% |
| Other values (1230) | 2397081 | 73.5% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 3092527 | 11.9% |
| n | 2365863 | 9.1% |
| r | 2222115 | 8.6% |
| a | 1671183 | 6.4% |
| i | 1530071 | 5.9% |
| l | 1133582 | 4.4% |
| t | 993122 | 3.8% |
| s | 958082 | 3.7% |
| h | 953468 | 3.7% |
| u | 941127 | 3.6% |
| Other values (50) | 10116677 | 38.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 25977817 | 100.0% |
zip
Real number (ℝ)
High correlation 
| Distinct | 1498 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 47364.088 |
| Minimum | 1067 |
| Maximum | 99974 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 1067 |
|---|---|
| 5-th percentile | 4838 |
| Q1 | 16540 |
| median | 49074 |
| Q3 | 74177 |
| 95-th percentile | 90574 |
| Maximum | 99974 |
| Range | 98907 |
| Interquartile range (IQR) | 57637 |
Descriptive statistics
| Standard deviation | 28885.385 |
|---|---|
| Coefficient of variation (CV) | 0.60985836 |
| Kurtosis | -1.3897096 |
| Mean | 47364.088 |
| Median Absolute Deviation (MAD) | 27153 |
| Skewness | 0.013123773 |
| Sum | 1.3191794 × 10¹¹ |
| Variance | 8.3436547 × 10⁸ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 76227 | 38109 | 1.4% |
| 13353 | 35053 | 1.3% |
| 14059 | 34104 | 1.2% |
| 85354 | 31932 | 1.1% |
| 22559 | 28776 | 1.0% |
| 21147 | 26423 | 0.9% |
| 22391 | 25418 | 0.9% |
| 13597 | 25406 | 0.9% |
| 14129 | 24710 | 0.9% |
| 65203 | 23150 | 0.8% |
| Other values (1488) | 2492108 | 89.5% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1067 | 10181 | 0.4% |
| 1069 | 33 | < 0.1% |
| 1097 | 589 | < 0.1% |
| 1109 | 2723 | 0.1% |
| 1129 | 879 | < 0.1% |
| 1159 | 406 | < 0.1% |
| 1187 | 5255 | 0.2% |
| 1219 | 245 | < 0.1% |
| 1237 | 50 | < 0.1% |
| 1445 | 8 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99974 | 588 | < 0.1% |
| 99947 | 672 | < 0.1% |
| 99880 | 3499 | 0.1% |
| 99867 | 76 | < 0.1% |
| 99817 | 296 | < 0.1% |
| 99752 | 621 | < 0.1% |
| 99734 | 356 | < 0.1% |
| 99610 | 1891 | 0.1% |
| 99518 | 8 | < 0.1% |
| 99510 | 5 | < 0.1% |
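German postal codes have five digits. Values below 10000, such as 1067, have therefore most likely lost a leading zero when the column was parsed as a number (01067 is a Dresden code). Parsing zip as a number also makes the report compute statistics with no meaning, such as a mean postal code. If the source file still contains the leading zeros and is loaded with pandas.read_csv, passing dtype={"zip": str} keeps them. Otherwise, the codes can be re-padded afterwards. A minimal sketch; the helper name normalise_zip is hypothetical:

```python
from typing import Union


def normalise_zip(value: Union[int, float]) -> str:
    """Render a numerically parsed German postal code as a 5-digit string."""
    code = int(value)
    if not 1 <= code <= 99_999:
        raise ValueError(f"not a 5-digit postal code: {value!r}")
    return f"{code:05d}"


print(normalise_zip(1067))     # smallest reported zip
print(normalise_zip(99974.0))  # largest reported zip
```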
last_station
Text
Missing 
| Distinct | 5870 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 40796 |
| Missing (%) | 1.5% |
| Memory size | 21.2 MiB |
Length
| Max length | 50 |
|---|---|
| Median length | 41 |
| Mean length | 15.764374 |
| Min length | 3 |
Unique
| Unique | 70 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | stolberg(rheinl)hbf gl.44 |
|---|---|
| 2nd row | eschweiler-st.jöris |
| 3rd row | alsdorf poststraße |
| 4th row | alsdorf-mariadorf |
| 5th row | alsdorf-kellersberg |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| berlin | 288558 | 6.9% |
| hbf | 185604 | 4.4% |
| hamburg | 93284 | 2.2% |
| münchen | 83457 | 2.0% |
| s | 81299 | 1.9% |
| bad | 34406 | 0.8% |
| karlsruhe | 33118 | 0.8% |
| stuttgart | 29159 | 0.7% |
| ost | 26994 | 0.6% |
| leipzig | 26630 | 0.6% |
| Other values (5818) | 3323763 | 79.0% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 5119053 | 11.8% |
| r | 3751615 | 8.7% |
| n | 3516766 | 8.1% |
| a | 2621122 | 6.1% |
| s | 2340810 | 5.4% |
| h | 2337235 | 5.4% |
| l | 2272648 | 5.3% |
| i | 2148228 | 5.0% |
| t | 2082827 | 4.8% |
| b | 2028706 | 4.7% |
| Other values (38) | 15044627 | 34.8% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 43263637 | 100.0% |
IBNR
Real number (ℝ)
High correlation  Missing 
| Distinct | 5264 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 135873 |
| Missing (%) | 4.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8019176.6 |
| Minimum | 8000001 |
| Maximum | 8099506 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 8000001 |
|---|---|
| 5-th percentile | 8000191 |
| Q1 | 8002047 |
| median | 8004440 |
| Q3 | 8011426 |
| 95-th percentile | 8089090 |
| Maximum | 8099506 |
| Range | 99505 |
| Interquartile range (IQR) | 9379 |
Descriptive statistics
| Standard deviation | 32023.558 |
|---|---|
| Coefficient of variation (CV) | 0.0039933723 |
| Kurtosis | 0.92620159 |
| Mean | 8019176.6 |
| Median Absolute Deviation (MAD) | 2945 |
| Skewness | 1.6805098 |
| Sum | 2.1245333 × 10¹³ |
| Variance | 1.0255083 × 10⁹ |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 8089028 | 10583 | 0.4% |
| 8098549 | 8892 | 0.3% |
| 8004128 | 8825 | 0.3% |
| 8089015 | 7451 | 0.3% |
| 8004132 | 7446 | 0.3% |
| 8004135 | 7444 | 0.3% |
| 8004129 | 7444 | 0.3% |
| 8098263 | 7442 | 0.3% |
| 8004131 | 7441 | 0.3% |
| 8004136 | 7430 | 0.3% |
| Other values (5254) | 2568918 | 92.2% |
| (Missing) | 135873 | 4.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8000001 | 614 | < 0.1% |
| 8000002 | 118 | < 0.1% |
| 8000004 | 853 | < 0.1% |
| 8000007 | 473 | < 0.1% |
| 8000009 | 530 | < 0.1% |
| 8000010 | 660 | < 0.1% |
| 8000011 | 593 | < 0.1% |
| 8000012 | 587 | < 0.1% |
| 8000013 | 1099 | < 0.1% |
| 8000014 | 679 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 8099506 | 225 | < 0.1% |
| 8098553 | 5223 | 0.2% |
| 8098549 | 8892 | 0.3% |
| 8098360 | 33 | < 0.1% |
| 8098348 | 225 | < 0.1% |
| 8098263 | 7442 | 0.3% |
| 8098205 | 3097 | 0.1% |
| 8098193 | 542 | < 0.1% |
| 8098147 | 3084 | 0.1% |
| 8098105 | 5725 | 0.2% |
long
Real number (ℝ)
High correlation  Missing 
| Distinct | 3184 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 142208 |
| Missing (%) | 5.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.148847 |
| Minimum | 0.834032 |
| Maximum | 14.982644 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 0.834032 |
|---|---|
| 5-th percentile | 6.851719 |
| Q1 | 8.364945 |
| median | 9.902741 |
| Q3 | 12.130664 |
| 95-th percentile | 13.54746 |
| Maximum | 14.982644 |
| Range | 14.148612 |
| Interquartile range (IQR) | 3.765719 |
Descriptive statistics
| Standard deviation | 2.2967626 |
|---|---|
| Coefficient of variation (CV) | 0.22630774 |
| Kurtosis | -1.1021461 |
| Mean | 10.148847 |
| Median Absolute Deviation (MAD) | 1.755656 |
| Skewness | 0.13274197 |
| Sum | 26823209 |
| Variance | 5.2751183 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 11.536537 | 8022 | 0.3% |
| 13.283966 | 7881 | 0.3% |
| 11.575386 | 7373 | 0.3% |
| 11.583234 | 7368 | 0.3% |
| 11.548572 | 7363 | 0.3% |
| 11.565619 | 7329 | 0.3% |
| 11.593049 | 6923 | 0.2% |
| 11.604971 | 6636 | 0.2% |
| 11.519245 | 6549 | 0.2% |
| 13.451646 | 6508 | 0.2% |
| Other values (3174) | 2571029 | 92.3% |
| (Missing) | 142208 | 5.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.834032 | 709 | < 0.1% |
| 0.896632 | 730 | < 0.1% |
| 6.070715 | 1431 | 0.1% |
| 6.07384 | 897 | < 0.1% |
| 6.074485 | 1049 | < 0.1% |
| 6.08378 | 734 | < 0.1% |
| 6.091499 | 1483 | 0.1% |
| 6.094486 | 1286 | < 0.1% |
| 6.097265 | 810 | < 0.1% |
| 6.098877 | 740 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14.982644 | 721 | < 0.1% |
| 14.97908 | 421 | < 0.1% |
| 14.936008 | 1 | < 0.1% |
| 14.930408 | 720 | < 0.1% |
| 14.902088 | 248 | < 0.1% |
| 14.889318 | 698 | < 0.1% |
| 14.825531 | 666 | < 0.1% |
| 14.825234 | 764 | < 0.1% |
| 14.805774 | 269 | < 0.1% |
| 14.706775 | 299 | < 0.1% |
lat
Real number (ℝ)
High correlation  Missing 
| Distinct | 3191 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 142208 |
| Missing (%) | 5.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.950548 |
| Minimum | 47.411032 |
| Maximum | 55.021381 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 47.411032 |
|---|---|
| 5-th percentile | 48.043452 |
| Q1 | 49.379389 |
| median | 51.037414 |
| Q3 | 52.493827 |
| 95-th percentile | 53.918621 |
| Maximum | 55.021381 |
| Range | 7.610349 |
| Interquartile range (IQR) | 3.114438 |
Descriptive statistics
| Standard deviation | 1.9056685 |
|---|---|
| Coefficient of variation (CV) | 0.037402316 |
| Kurtosis | -0.95636919 |
| Mean | 50.950548 |
| Median Absolute Deviation (MAD) | 1.479092 |
| Skewness | 0.0059574693 |
| Sum | 1.3466133 × 10⁸ |
| Variance | 3.6315725 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 48.142623 | 8022 | 0.3% |
| 52.500737 | 7881 | 0.3% |
| 48.137048 | 7373 | 0.3% |
| 48.134202 | 7368 | 0.3% |
| 48.141969 | 7363 | 0.3% |
| 48.139452 | 7329 | 0.3% |
| 48.129168 | 6923 | 0.2% |
| 48.12744 | 6636 | 0.2% |
| 48.14354 | 6549 | 0.2% |
| 52.505976 | 6508 | 0.2% |
| Other values (3181) | 2571029 | 92.3% |
| (Missing) | 142208 | 5.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 47.411032 | 221 | < 0.1% |
| 47.4179544 | 705 | < 0.1% |
| 47.44003 | 188 | < 0.1% |
| 47.456591 | 214 | < 0.1% |
| 47.491452 | 398 | < 0.1% |
| 47.5058367 | 1419 | 0.1% |
| 47.513241 | 434 | < 0.1% |
| 47.5251713 | 720 | < 0.1% |
| 47.543785 | 719 | < 0.1% |
| 47.544341 | 515 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 55.021381 | 731 | < 0.1% |
| 55.019862 | 713 | < 0.1% |
| 55.017947 | 748 | < 0.1% |
| 55.01765 | 718 | < 0.1% |
| 55.0149 | 734 | < 0.1% |
| 55.012455 | 725 | < 0.1% |
| 55.010432 | 736 | < 0.1% |
| 55.008077 | 709 | < 0.1% |
| 55.001937 | 726 | < 0.1% |
| 54.988543 | 765 | < 0.1% |
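The lat/long extremes deserve a sanity check. The two smallest longitudes, 0.834032 and 0.896632, lie far west of Germany, whose territory spans roughly 47.3° to 55.1° N and 5.9° to 15.0° E. From the report alone it is impossible to tell whether these rows are foreign stations or bad geocodes. The sketch below flags coordinates that fall outside an approximate, slightly padded bounding box. The bounds are an assumption and are not taken from the report. The latitude paired with the smallest longitude is not shown, so the 50.0 used below is hypothetical.

```python
from typing import Optional

# Approximate bounding box of Germany (assumed), padded slightly to avoid border false positives.
LAT_MIN, LAT_MAX = 47.2, 55.1
LON_MIN, LON_MAX = 5.8, 15.1


def outside_germany(lat: Optional[float], long: Optional[float]) -> Optional[bool]:
    """True if the point lies outside the approximate box; None if a coordinate is missing."""
    if lat is None or long is None:
        return None
    return not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= long <= LON_MAX)


print(outside_germany(50.950548, 10.148847))  # reported mean position
print(outside_germany(50.0, 0.834032))        # hypothetical latitude with the smallest longitude
print(outside_germany(None, 10.0))            # missing coordinate
```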
arrival_plan
Date
Missing 
| Distinct | 10081 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 1183672 |
| Missing (%) | 42.5% |
| Memory size | 21.2 MiB |
| Minimum | 2024-07-07 23:37:00 |
| Maximum | 2024-07-14 23:58:00 |
departure_plan
Date
Missing 
| Distinct | 10080 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 972317 |
| Missing (%) | 34.9% |
| Memory size | 21.2 MiB |
| Minimum | 2024-07-08 00:00:00 |
| Maximum | 2024-07-14 23:59:00 |
arrival_change
Date
Missing 
| Distinct | 10112 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 1422439 |
| Missing (%) | 51.1% |
| Memory size | 21.2 MiB |
| Minimum | 2024-07-07 23:39:00 |
| Maximum | 2024-07-15 01:00:00 |
departure_change
Date
Missing 
| Distinct | 10107 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 1281468 |
| Missing (%) | 46.0% |
| Memory size | 21.2 MiB |
| Minimum | 2024-07-08 00:00:00 |
| Maximum | 2024-07-15 01:00:00 |
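The four Date columns hold the planned and changed arrival and departure times. If a changed time is read as the updated time for the same stop, the delay in minutes is the difference between the two. The report does not say how arrival_delay_m and departure_delay_m were actually derived, and their missing counts do not match those of arrival_change or departure_change. Treat the following as a hypothetical sketch. The timestamp format mirrors the Minimum/Maximum values shown above.

```python
from datetime import datetime
from typing import Optional

FMT = "%Y-%m-%d %H:%M:%S"  # matches the Minimum/Maximum values in these sections


def delay_minutes(plan: Optional[str], change: Optional[str]) -> Optional[int]:
    """Whole minutes from a planned to a changed time (hypothetical); None if either is missing."""
    if plan is None or change is None:
        return None
    delta = datetime.strptime(change, FMT) - datetime.strptime(plan, FMT)
    return int(delta.total_seconds() // 60)


print(delay_minutes("2024-07-14 23:58:00", "2024-07-15 00:03:00"))  # crosses midnight: 5
print(delay_minutes("2024-07-08 00:00:00", None))                    # missing change: None
```

Using full timestamps rather than clock times keeps changes that cross midnight correct. The report's delay columns have no negative values, which suggests early arrivals were clipped or excluded upstream. That is not confirmed here.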
arrival_delay_m
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 110 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 972317 |
| Missing (%) | 34.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.1088477 |
| Minimum | 0 |
| Maximum | 159 |
| Zeros | 1256668 |
| Zeros (%) | 45.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 5 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.2636363 |
|---|---|
| Coefficient of variation (CV) | 2.9432682 |
| Kurtosis | 110.72395 |
| Mean | 1.1088477 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.8255256 |
| Sum | 2010199 |
| Variance | 10.651322 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1256668 | 45.1% |
| 1 | 218485 | 7.8% |
| 2 | 113139 | 4.1% |
| 3 | 69418 | 2.5% |
| 4 | 38750 | 1.4% |
| 5 | 26192 | 0.9% |
| 6 | 18044 | 0.6% |
| 7 | 12782 | 0.5% |
| 8 | 10071 | 0.4% |
| 9 | 8127 | 0.3% |
| Other values (100) | 41196 | 1.5% |
| (Missing) | 972317 | 34.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1256668 | 45.1% |
| 1 | 218485 | 7.8% |
| 2 | 113139 | 4.1% |
| 3 | 69418 | 2.5% |
| 4 | 38750 | 1.4% |
| 5 | 26192 | 0.9% |
| 6 | 18044 | 0.6% |
| 7 | 12782 | 0.5% |
| 8 | 10071 | 0.4% |
| 9 | 8127 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 159 | 1 | < 0.1% |
| 157 | 1 | < 0.1% |
| 140 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 134 | 1 | < 0.1% |
| 133 | 2 | < 0.1% |
| 120 | 1 | < 0.1% |
| 117 | 1 | < 0.1% |
| 116 | 1 | < 0.1% |
| 110 | 7 | < 0.1% |
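Because the ten most common arrival delays are exactly 0 to 9 minutes, the value counts above are enough to apply a punctuality threshold below 10 minutes directly. The example uses 6 minutes, the cutoff Deutsche Bahn applies in its published punctuality figures (a stop counts as punctual if it is less than 6 minutes late). It assumes "Other values (100)" holds only delays of 10 minutes or more, which follows from delays being non-negative integers. A minimal sketch:

```python
# Value counts of arrival_delay_m from the Common values table above (non-missing rows only).
counts = {0: 1_256_668, 1: 218_485, 2: 113_139, 3: 69_418, 4: 38_750,
          5: 26_192, 6: 18_044, 7: 12_782, 8: 10_071, 9: 8_127}
OTHER = 41_196                      # "Other values (100)": delays of 10 minutes or more
NON_MISSING = 2_785_189 - 972_317   # rows with a recorded arrival delay


def share_below(threshold: int) -> float:
    """Share of non-missing rows whose delay is strictly below `threshold` minutes."""
    if threshold > 10:
        raise ValueError("delays of 10+ minutes are only available in aggregate")
    return sum(c for minutes, c in counts.items() if minutes < threshold) / NON_MISSING


print(f"arrivals under 6 minutes late: {share_below(6):.1%}")
```

The denominator excludes the 972317 missing rows. Dividing by all 2785189 rows instead would understate punctuality.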
departure_delay_m
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 113 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 972317 |
| Missing (%) | 34.9% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.1712868 |
| Minimum | 0 |
| Maximum | 159 |
| Zeros | 1190749 |
| Zeros (%) | 42.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 21.2 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 6 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.3009847 |
|---|---|
| Coefficient of variation (CV) | 2.8182549 |
| Kurtosis | 109.1751 |
| Mean | 1.1712868 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.7631659 |
| Sum | 2123393 |
| Variance | 10.8965 |
| Monotonicity | Not monotonic |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1190749 | 42.8% |
| 1 | 264731 | 9.5% |
| 2 | 128372 | 4.6% |
| 3 | 70583 | 2.5% |
| 4 | 39753 | 1.4% |
| 5 | 26517 | 1.0% |
| 6 | 18206 | 0.7% |
| 7 | 12989 | 0.5% |
| 8 | 10239 | 0.4% |
| 9 | 8183 | 0.3% |
| Other values (103) | 42550 | 1.5% |
| (Missing) | 972317 | 34.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1190749 | 42.8% |
| 1 | 264731 | 9.5% |
| 2 | 128372 | 4.6% |
| 3 | 70583 | 2.5% |
| 4 | 39753 | 1.4% |
| 5 | 26517 | 1.0% |
| 6 | 18206 | 0.7% |
| 7 | 12989 | 0.5% |
| 8 | 10239 | 0.4% |
| 9 | 8183 | 0.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 159 | 1 | < 0.1% |
| 156 | 1 | < 0.1% |
| 137 | 1 | < 0.1% |
| 135 | 1 | < 0.1% |
| 134 | 2 | < 0.1% |
| 133 | 1 | < 0.1% |
| 132 | 1 | < 0.1% |
| 120 | 1 | < 0.1% |
| 117 | 1 | < 0.1% |
| 115 | 1 | < 0.1% |
info
Categorical
High correlation  Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 2201357 |
| Missing (%) | 79.0% |
| Memory size | 21.2 MiB |
| Information | 219535 |
|---|---|
| Störung | 105210 |
| Bauarbeiten | 88958 |
| Information. (Quelle: zuginfo.nrw) | 73213 |
| Bauarbeiten. (Quelle: zuginfo.nrw) | 64167 |
| Other values (2) | 32749 |
Length
| Max length | 34 |
|---|---|
| Median length | 11 |
| Mean length | 16.518603 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Störung. (Quelle: zuginfo.nrw) |
|---|---|
| 2nd row | Störung. (Quelle: zuginfo.nrw) |
| 3rd row | Störung. (Quelle: zuginfo.nrw) |
| 4th row | Störung. (Quelle: zuginfo.nrw) |
| 5th row | Information |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| Information | 219535 | 7.9% |
| Störung | 105210 | 3.8% |
| Bauarbeiten | 88958 | 3.2% |
| Information. (Quelle: zuginfo.nrw) | 73213 | 2.6% |
| Bauarbeiten. (Quelle: zuginfo.nrw) | 64167 | 2.3% |
| Störung. (Quelle: zuginfo.nrw) | 25423 | 0.9% |
| Großstörung | 7326 | 0.3% |
| (Missing) | 2201357 | 79.0% |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| information | 292748 | 32.2% |
| quelle | 162803 | 17.9% |
| zuginfo.nrw | 162803 | 17.9% |
| bauarbeiten | 153125 | 16.8% |
| störung | 130633 | 14.4% |
| großstörung | 7326 | 0.8% |
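The word table splits each message into lowercase tokens, so "Quelle" and "zuginfo.nrw" appear as separate words with identical counts. A rough approximation reproduces these counts from the Common Values table: split on whitespace, strip punctuation from both ends of each token, and lowercase it. This is not ydata-profiling's exact tokenizer. A minimal sketch:

```python
import string
from collections import Counter


def words(text: str) -> list[str]:
    """Whitespace-split, strip surrounding ASCII punctuation, lowercase (approximation)."""
    tokens = (t.strip(string.punctuation).lower() for t in text.split())
    return [t for t in tokens if t]


# Category counts from the Common Values table of `info` (missing rows excluded).
info_counts = {
    "Information": 219_535,
    "Störung": 105_210,
    "Bauarbeiten": 88_958,
    "Information. (Quelle: zuginfo.nrw)": 73_213,
    "Bauarbeiten. (Quelle: zuginfo.nrw)": 64_167,
    "Störung. (Quelle: zuginfo.nrw)": 25_423,
    "Großstörung": 7_326,
}

word_counts = Counter()
for message, n in info_counts.items():
    for w in words(message):
        word_counts[w] += n

print(word_counts.most_common())
```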
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| n | 1202186 | 12.5% |
| o | 755625 | 7.8% |
| r | 753961 | 7.8% |
| e | 631856 | 6.6% |
| u | 616690 | 6.4% |
| i | 608676 | 6.3% |
| a | 598998 | 6.2% |
| t | 583832 | 6.1% |
| f | 455551 | 4.7% |
| l | 325606 | 3.4% |
| Other values (18) | 3111108 | 32.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 9644089 | 100.0% |
canceled
Boolean
Imbalance 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.7 MiB |
| False | 2573834 |
|---|---|
| True | 211355 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 2573834 | 92.4% |
| True | 211355 | 7.6% |
info_present
Boolean
High correlation 
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 2.7 MiB |
| False | 2201357 |
|---|---|
| True | 583832 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| False | 2201357 | 79.0% |
| True | 583832 | 21.0% |
transformed_info_message
Categorical
High correlation  Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 21.2 MiB |
| No message | 2201357 |
|---|---|
| Information | 292748 |
| Bauarbeiten | 153125 |
| Störung | 130633 |
| Großstörung | 7326 |
Length
| Max length | 11 |
|---|---|
| Median length | 10 |
| Mean length | 10.022009 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | No message |
|---|---|
| 2nd row | No message |
| 3rd row | No message |
| 4th row | No message |
| 5th row | No message |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| No message | 2201357 | 79.0% |
| Information | 292748 | 10.5% |
| Bauarbeiten | 153125 | 5.5% |
| Störung | 130633 | 4.7% |
| Großstörung | 7326 | 0.3% |
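The counts suggest that transformed_info_message is info with the ". (Quelle: zuginfo.nrw)" source suffix removed and missing values replaced by "No message". The sums match: Information 219535 + 73213 = 292748, Bauarbeiten 88958 + 64167 = 153125, and Störung 105210 + 25423 = 130633. The 2201357 "No message" rows also equal the missing info rows and the rows where info_present is False. This mapping is inferred from the numbers; the transformation code is not part of the report. A sketch of the mapping:

```python
from collections import Counter
from typing import Optional

SOURCE_SUFFIX = ". (Quelle: zuginfo.nrw)"  # suffix observed in the info values


def transform_info(info: Optional[str]) -> str:
    """Map a raw info message to its base category (inferred mapping)."""
    if info is None:
        return "No message"
    return info.removesuffix(SOURCE_SUFFIX)  # Python 3.9+


def info_present(info: Optional[str]) -> bool:
    """Flag rows that carry any info message."""
    return info is not None


# Counts from the info Common Values table; None stands for the missing rows.
info_counts = {
    None: 2_201_357,
    "Information": 219_535,
    "Störung": 105_210,
    "Bauarbeiten": 88_958,
    "Information. (Quelle: zuginfo.nrw)": 73_213,
    "Bauarbeiten. (Quelle: zuginfo.nrw)": 64_167,
    "Störung. (Quelle: zuginfo.nrw)": 25_423,
    "Großstörung": 7_326,
}

transformed = Counter()
for raw, n in info_counts.items():
    transformed[transform_info(raw)] += n

print(transformed.most_common())
```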
Words
| Value | Count | Frequency (%) |
|---|---|---|
| no | 2201357 | 44.1% |
| message | 2201357 | 44.1% |
| information | 292748 | 5.9% |
| bauarbeiten | 153125 | 3.1% |
| störung | 130633 | 2.6% |
| großstörung | 7326 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| e | 4708964 | 16.9% |
| s | 4410040 | 15.8% |
| a | 2800355 | 10.0% |
| o | 2794179 | 10.0% |
| m | 2494105 | 8.9% |
| g | 2339316 | 8.4% |
| N | 2201357 | 7.9% |
| (space) | 2201357 | 7.9% |
| n | 876580 | 3.1% |
| r | 591158 | 2.1% |
| Other values (11) | 2495779 | 8.9% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27913190 | 100.0% |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 4708964 | |
| s | 4410040 | |
| a | 2800355 | |
| o | 2794179 | |
| m | 2494105 | |
| g | 2339316 | |
| N | 2201357 | |
| 2201357 | ||
| n | 876580 | 3.1% |
| r | 591158 | 2.1% |
| Other values (11) | 2495779 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 27913190 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| e | 4708964 | 16.9% |
| s | 4410040 | 15.8% |
| a | 2800355 | 10.0% |
| o | 2794179 | 10.0% |
| m | 2494105 | 8.9% |
| g | 2339316 | 8.4% |
| N | 2201357 | 7.9% |
| (space) | 2201357 | 7.9% |
| n | 876580 | 3.1% |
| r | 591158 | 2.1% |
| Other values (11) | 2495779 | 8.9% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 27913190 | 100.0% |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
|---|---|---|
| e | 4708964 | 16.9% |
| s | 4410040 | 15.8% |
| a | 2800355 | 10.0% |
| o | 2794179 | 10.0% |
| m | 2494105 | 8.9% |
| g | 2339316 | 8.4% |
| N | 2201357 | 7.9% |
| (space) | 2201357 | 7.9% |
| n | 876580 | 3.1% |
| r | 591158 | 2.1% |
| Other values (11) | 2495779 | 8.9% |
clear_station_name
Unsupported
Missing  Rejected  Unsupported 
| Missing | 2785189 |
|---|---|
| Missing (%) | 100.0% |
| Memory size | 21.2 MiB |
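clear_station_name is missing in all 2785189 rows and is marked Rejected, so it carries no information. When loading the data with pandas, a sketch like the following drops every all-NaN column; the two-row frame is a toy stand-in, not the real dataset:

```python
import numpy as np
import pandas as pd

# Toy frame: clear_station_name is entirely NaN, as in the profiled data.
df = pd.DataFrame({
    "city": ["Aachen", "Bürstadt"],
    "clear_station_name": [np.nan, np.nan],
})

# Columns in which every value is missing.
all_missing = df.columns[df.isna().all()]
df = df.drop(columns=all_missing)

print(list(all_missing), list(df.columns))
```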
Interactions
Correlations
| | IBNR | ID_Base | ID_Timestamp | arrival_delay_m | canceled | departure_delay_m | info | info_present | lat | long | starting_station_IBNR | stop_number | transformed_info_message | zip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IBNR | 1.000 | -0.004 | 0.001 | -0.117 | 0.091 | -0.121 | 0.323 | 0.212 | 0.254 | 0.472 | 0.662 | 0.138 | 0.148 | -0.497 |
| ID_Base | -0.004 | 1.000 | -0.001 | -0.001 | 0.003 | -0.001 | 0.031 | 0.012 | 0.002 | 0.003 | -0.009 | -0.002 | 0.013 | 0.005 |
| ID_Timestamp | 0.001 | -0.001 | 1.000 | -0.024 | 0.021 | -0.023 | 0.080 | 0.038 | 0.005 | 0.003 | 0.001 | 0.002 | 0.039 | -0.006 |
| arrival_delay_m | -0.117 | -0.001 | -0.024 | 1.000 | 0.036 | 0.823 | 0.027 | 0.012 | -0.264 | -0.110 | -0.147 | 0.317 | 0.010 | 0.242 |
| canceled | 0.091 | 0.003 | 0.021 | 0.036 | 1.000 | 0.021 | 0.071 | 0.012 | 0.071 | 0.061 | 0.073 | 0.333 | 0.022 | 0.067 |
| departure_delay_m | -0.121 | -0.001 | -0.023 | 0.823 | 0.021 | 1.000 | 0.027 | 0.012 | -0.283 | -0.115 | -0.158 | 0.273 | 0.010 | 0.261 |
| info | 0.323 | 0.031 | 0.080 | 0.027 | 0.071 | 0.027 | 1.000 | 1.000 | 0.528 | 0.553 | 0.301 | 0.113 | 1.000 | 0.571 |
| info_present | 0.212 | 0.012 | 0.038 | 0.012 | 0.012 | 0.012 | 1.000 | 1.000 | 0.172 | 0.192 | 0.189 | 0.099 | 1.000 | 0.224 |
| lat | 0.254 | 0.002 | 0.005 | -0.264 | 0.071 | -0.283 | 0.528 | 0.172 | 1.000 | 0.214 | 0.267 | 0.002 | 0.202 | -0.540 |
| long | 0.472 | 0.003 | 0.003 | -0.110 | 0.061 | -0.115 | 0.553 | 0.192 | 0.214 | 1.000 | 0.440 | 0.057 | 0.202 | -0.260 |
| starting_station_IBNR | 0.662 | -0.009 | 0.001 | -0.147 | 0.073 | -0.158 | 0.301 | 0.189 | 0.267 | 0.440 | 1.000 | 0.140 | 0.133 | -0.574 |
| stop_number | 0.138 | -0.002 | 0.002 | 0.317 | 0.333 | 0.273 | 0.113 | 0.099 | 0.002 | 0.057 | 0.140 | 1.000 | 0.060 | -0.043 |
| transformed_info_message | 0.148 | 0.013 | 0.039 | 0.010 | 0.022 | 0.010 | 1.000 | 1.000 | 0.202 | 0.202 | 0.133 | 0.060 | 1.000 | 0.234 |
| zip | -0.497 | 0.005 | -0.006 | 0.242 | 0.067 | 0.261 | 0.571 | 0.224 | -0.540 | -0.260 | -0.574 | -0.043 | 0.234 | 1.000 |
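The "highly overall correlated" alerts in the overview can be read off this matrix by scanning its upper triangle for large absolute values. The 0.5 cut-off below is an assumption, chosen because it matches the overview's flags for these columns (IBNR/zip at -0.497 is not flagged, starting_station_IBNR/zip at -0.574 is); only five of the fourteen variables are copied in:

```python
import pandas as pd

cols = ["IBNR", "starting_station_IBNR", "arrival_delay_m", "departure_delay_m", "zip"]
values = [
    [1.000, 0.662, -0.117, -0.121, -0.497],
    [0.662, 1.000, -0.147, -0.158, -0.574],
    [-0.117, -0.147, 1.000, 0.823, 0.242],
    [-0.121, -0.158, 0.823, 1.000, 0.261],
    [-0.497, -0.574, 0.242, 0.261, 1.000],
]
corr = pd.DataFrame(values, index=cols, columns=cols)

THRESHOLD = 0.5  # assumed cut-off, not read from the profiler's configuration

# Visit each unordered pair once (upper triangle) and keep strong correlations.
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > THRESHOLD
]
for a, b, r in pairs:
    print(f"{a} ~ {b}: {r:+.3f}")
```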
Missing values
Sample
First rows
| | ID_Base | ID_Timestamp | stop_number | line | starting_station_IBNR | city | zip | last_station | IBNR | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | canceled | info_present | transformed_info_message | clear_station_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -2065137557584893414 | 2407082237 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-08 22:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 1 | -2065137557584893414 | 2407092237 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-09 22:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 2 | -2065137557584893414 | 2407102237 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-10 22:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 3 | -2065137557584893414 | 2407112237 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-11 22:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 4 | -2065137557584893414 | 2407122237 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-12 22:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 5 | -2065137557584893414 | 2407132237 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-13 22:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 6 | -2065137557584893414 | 2407142237 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-14 22:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 7 | -3561454673811003901 | 2407082137 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-08 21:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 8 | -3561454673811003901 | 2407092137 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-09 21:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
| 9 | -3561454673811003901 | 2407102137 | 1 | 29 | 8000001 | Aachen | 52064 | NaN | 8000001.0 | 6.091499 | 50.7678 | NaN | 2024-07-10 21:37:00 | NaN | NaN | 0.0 | 0.0 | NaN | True | False | No message | NaN |
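In these first rows, ID_Timestamp reads as yyMMddHHmm and matches departure_plan (2407082237 and 2024-07-08 22:37:00, 2407082137 and 2024-07-08 21:37:00). That is an inference from the sample, not a documented schema. Under that assumption the field decodes with `datetime.strptime`; `decode_id_timestamp` is a hypothetical helper name:

```python
from datetime import datetime


def decode_id_timestamp(ts: int) -> datetime:
    """Decode an ID_Timestamp, assuming the digits are yyMMddHHmm."""
    return datetime.strptime(str(ts), "%y%m%d%H%M")


print(decode_id_timestamp(2407082237))  # 2024-07-08 22:37:00
```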
Last rows
| | ID_Base | ID_Timestamp | stop_number | line | starting_station_IBNR | city | zip | last_station | IBNR | long | lat | arrival_plan | departure_plan | arrival_change | departure_change | arrival_delay_m | departure_delay_m | info | canceled | info_present | transformed_info_message | clear_station_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2785179 | 6234297817509604666 | 2407112012 | 2 | 70 | 8098360 | Bürstadt | 68642 | frankfurt-niederrad | 8002050.0 | 9.703958 | 47.552786 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | False | No message | NaN |
| 2785180 | 6234297817509604666 | 2407112012 | 3 | 70 | 8098360 | Bürstadt | 68642 | walldorf(hess) | 8006175.0 | 8.580811 | 50.001339 | 2024-07-11 20:24:00 | 2024-07-11 20:25:00 | 2024-07-11 20:25:00 | 2024-07-11 20:25:00 | 1.0 | 0.0 | Information | False | True | Information | NaN |
| 2785181 | 6234297817509604666 | 2407112012 | 4 | 70 | 8098360 | Bürstadt | 68642 | mörfelden | 8004065.0 | 7.666105 | 47.615929 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | False | No message | NaN |
| 2785182 | 6234297817509604666 | 2407112012 | 5 | 70 | 8098360 | Bürstadt | 68642 | groß gerau-dornberg | 8002386.0 | 8.494709 | 49.912279 | 2024-07-11 20:33:00 | 2024-07-11 20:34:00 | 2024-07-11 20:33:00 | 2024-07-11 20:34:00 | 0.0 | 0.0 | Information | False | True | Information | NaN |
| 2785183 | 6234297817509604666 | 2407112012 | 6 | 70 | 8098360 | Bürstadt | 68642 | riedstadt-goddelau | 8000126.0 | 8.489187 | 49.833230 | 2024-07-11 20:39:00 | 2024-07-11 20:39:00 | 2024-07-11 20:39:00 | 2024-07-11 20:40:00 | 0.0 | 1.0 | Information | False | True | Information | NaN |
| 2785184 | 6234297817509604666 | 2407112012 | 7 | 70 | 8098360 | Bürstadt | 68642 | stockstadt(rhein) | 8005740.0 | 8.125738 | 49.196735 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | False | No message | NaN |
| 2785185 | 6234297817509604666 | 2407112012 | 8 | 70 | 8098360 | Bürstadt | 68642 | biebesheim | 8000951.0 | 8.473978 | 49.781977 | 2024-07-11 20:45:00 | 2024-07-11 20:45:00 | 2024-07-11 20:45:00 | 2024-07-11 20:45:00 | 0.0 | 0.0 | Information | False | True | Information | NaN |
| 2785186 | 6234297817509604666 | 2407112012 | 9 | 70 | 8098360 | Bürstadt | 68642 | gernsheim | 8002249.0 | 12.242452 | 51.822244 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | False | No message | NaN |
| 2785187 | 6234297817509604666 | 2407112012 | 10 | 70 | 8098360 | Bürstadt | 68642 | groß-rohrheim | NaN | 12.078006 | 50.882649 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | False | False | No message | NaN |
| 2785188 | 6234297817509604666 | 2407112012 | 11 | 70 | 8098360 | Bürstadt | 68642 | biblis | 8000503.0 | 8.450413 | 49.688881 | 2024-07-11 20:56:00 | 2024-07-11 20:57:00 | 2024-07-11 20:56:00 | 2024-07-11 20:57:00 | 0.0 | 0.0 | Information | False | True | Information | NaN |
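The last rows are consistent with each delay being the difference between the *_change and *_plan timestamps in minutes (row 2785180: planned arrival 20:24, changed arrival 20:25, arrival_delay_m 1.0). A sketch under that assumption, using three rows copied from the table. The first rows show 0.0 where departure_change is NaN, which suggests missing changes were filled with zero upstream; this sketch does not model that.

```python
import pandas as pd

# Three rows copied from the table, indexed by their original row numbers.
rows = pd.DataFrame(
    {
        "arrival_plan": ["2024-07-11 20:24", "2024-07-11 20:33", "2024-07-11 20:39"],
        "arrival_change": ["2024-07-11 20:25", "2024-07-11 20:33", "2024-07-11 20:39"],
        "departure_plan": ["2024-07-11 20:25", "2024-07-11 20:34", "2024-07-11 20:39"],
        "departure_change": ["2024-07-11 20:25", "2024-07-11 20:34", "2024-07-11 20:40"],
    },
    index=[2785180, 2785182, 2785183],
).apply(pd.to_datetime)

# Assumed definition: delay = changed time minus planned time, in minutes.
for kind in ("arrival", "departure"):
    delta = rows[f"{kind}_change"] - rows[f"{kind}_plan"]
    rows[f"{kind}_delay_m"] = delta.dt.total_seconds() / 60

print(rows[["arrival_delay_m", "departure_delay_m"]])
```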